In this paper we give optimal constants in Talagrand's concentration
inequalities for maxima of empirical processes associated to independent
and possibly nonidentically distributed random variables. Our approach is
based on the entropy method introduced by Ledoux.

1. Introduction. Let $X_1, X_2, \ldots$ be a sequence of independent random
variables with values in some Polish space $\mathcal{X}$ and let $\mathcal{S}$ be a
countable class of measurable functions from $\mathcal{X}$ into $[-1,1]^n$. For
$s = (s^1, \ldots, s^n)$ in $\mathcal{S}$, we set

(1.1) $S_n(s) = s^1(X_1) + \cdots + s^n(X_n)$.

In this paper we are interested in concentration inequalities for
$Z = \sup\{S_n(s) : s \in \mathcal{S}\}$.

Now let us recall the main results in this direction. Starting from
concentration inequalities for product measures, Talagrand (1996) obtained
Bennett type upper bounds on the Laplace transform of $Z$. More precisely he
proved

(1.2) $\log \mathbb{E}\exp(tZ) \le t\mathbb{E}(Z) + V a b^{-2}(e^{bt} - bt - 1)$

for any positive $t$. Here

$V = \mathbb{E} \sup_{s \in \mathcal{S}} \sum_{k=1}^n (s^k(X_k))^2$.

In order to analyze the variance factor $V$, set

(1.3) $V_n = \sup_{s \in \mathcal{S}} \mathrm{Var}\, S_n(s)$.
Then, one can derive from the comparison inequalities in Ledoux and
Talagrand (1991) that $V_n \le V \le V_n + 16\mathbb{E}(Z)$ [see Massart (2000),
page 882]. Consequently $V$ is often close to the maximal variance $V_n$. The
conjecture concerning the constants is then $a = b = 1$. The constant $a$ plays
a fundamental role; in particular, for Donsker classes, $a = 1$ gives the exact
rate function in the moderate deviations bandwidth. Nevertheless it seems
difficult to reach $a = 1$ via Talagrand's method [see Panchenko (2001) for
more about the constants in Talagrand's concentration inequalities for
product measures]. In order to obtain concentration inequalities more
directly, Ledoux (1996) used a log-Sobolev type method together with a
powerful argument of tensorization of the entropy. When applied to
$\exp(tZ)$, this method yields a differential inequality (this is the
so-called Herbst argument) on the Laplace transform of $Z$ and gives (1.2)
again. Applying Ledoux's method, Massart (2000) obtained $a = 8$ in (1.2) with
Talagrand's variance factor and $a = 4$ in (1.2) with the variance factor
$V_n + 16\mathbb{E}(Z)$. Later on, Rio (2002) proved (1.2) for independent and
identically distributed (i.i.d.) random variables (in the i.i.d. case
$s^1 = \cdots = s^n$) with $a = 1$, $b = 3/2$ and a variance factor
$v = V_n + 2\mathbb{E}(Z)$. Next, Bousquet (2003) found a nice trick to improve
Rio's inequality. He proved (1.2) with $a = b = 1$ and the variance factor $v$
in the i.i.d. case. For negative values of $t$, Klein (2002) obtained (1.2) in
the i.i.d. case with $a = 1$, $b = 4$ and the same factor $v$.

Here we are interested in optimal constants in Talagrand's inequalities 
for nonidentically distributed random variables. Our approach to obtain the 
best constants is to apply the lemma of tensorization of the entropy pro- 
posed by Ledoux (1996). However, the differential inequality on the Laplace 
transform of Z is more involved than in the i.i.d. case. Therefore the results 
are suboptimal in the large deviations bandwidth. We start by right-hand 
side deviations. 

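Theorem 1.1. Let $\mathcal{S}$ be a countable class of measurable functions
with values in $[-1,1]^n$. Suppose that $\mathbb{E}(s^k(X_k)) = 0$ for any
$s = (s^1, \ldots, s^n)$ in $\mathcal{S}$ and any integer $k$ in $[1,n]$. Let $L$
denote the logarithm of the Laplace transform of $Z$. Then, for any positive $t$,

(a) $L(t) \le t\mathbb{E}(Z) + \frac{t}{2}(2\mathbb{E}(Z) + V_n)(\exp((e^{2t} - 1)/2) - 1)$.

Consequently, setting $v = 2\mathbb{E}(Z) + V_n$, for any positive $x$,

(b) $\mathbb{P}(Z \ge \mathbb{E}(Z) + x) \le \exp\bigl(-\frac{x}{4}\log(1 + 2\log(1 + x/v))\bigr)$

and

(c) $\mathbb{P}(Z \ge \mathbb{E}(Z) + x) \le \exp\Bigl(-\frac{x^2}{v + \sqrt{v^2 + 3vx} + (3x/2)}\Bigr) \le \exp\Bigl(-\frac{x^2}{2v + 3x}\Bigr)$.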
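As a rough numerical illustration, the sketch below evaluates the three right-tail bounds, assuming they read $\exp(-(x/4)\log(1 + 2\log(1 + x/v)))$, $\exp(-x^2/(v + \sqrt{v^2 + 3vx} + 3x/2))$ and $\exp(-x^2/(2v + 3x))$ with $v = 2\mathbb{E}(Z) + V_n$. The function name `right_tail_bounds` and the sample values of `x`, `mean_z` and `v_n` are hypothetical and chosen only for illustration.

```python
import math


def right_tail_bounds(x, mean_z, v_n):
    """Evaluate the bounds (b) and (c) on P(Z >= E(Z) + x), as assumed above."""
    v = 2.0 * mean_z + v_n  # variance factor v = 2 E(Z) + V_n
    bound_b = math.exp(-(x / 4.0) * math.log(1.0 + 2.0 * math.log(1.0 + x / v)))
    bound_c_sharp = math.exp(-x * x / (v + math.sqrt(v * v + 3.0 * v * x) + 1.5 * x))
    bound_c_simple = math.exp(-x * x / (2.0 * v + 3.0 * x))
    return bound_b, bound_c_sharp, bound_c_simple


# Hypothetical inputs: E(Z) = 2, V_n = 10, deviation x = 5.
bound_b, bound_c_sharp, bound_c_simple = right_tail_bounds(x=5.0, mean_z=2.0, v_n=10.0)
```

The second form of (c) is weaker than the first but easier to invert when a quantile is needed.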
Remark 1.1. In the spirit of Massart's paper (2000), Theorem 1.1(b)
can be improved for large values of $x$ to get a Bennett type inequality with
$a = 1$.

Remark 1.2. Theorem 1.1 applies to set-indexed empirical processes
associated to nonidentically distributed random variables. In that case
$s^i(X_i) = 1_{X_i \in S} - \mathbb{P}(X_i \in S)$ and consequently the centering
constant depends on $i$.
Some different concentration inequalities for set-indexed empirical processes 
are given in Rio [(2001), Theorem 4.2 and Remark 4.1]. However, due to 
the concavity of the polynomial function u(l — u), the variance factor in Rio 
(2001) is suboptimal for nonidentically distributed random variables. Here, 
as a by-product of Theorem 1.1(a), we get the upper bound below for the 
variance of Z. 

Corollary 1.1. Under the assumptions of Theorem 1.1(a),
$\mathrm{Var}\, Z \le V_n + 2\mathbb{E}(Z)$.

For left-hand side deviations, the concentration bounds are similar. How- 
ever, the proof is more intricate. We emphasize that the proof of Theorem 
1.1 is not relevant for left-hand side deviations. This is the reason why we 
need to compensate the empirical process for left-hand side deviations. 

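Theorem 1.2. Under the assumptions of Theorem 1.1, for any positive $t$,

(a) $L(-t) \le -t\mathbb{E}(Z) + \frac{v}{9}(e^{3t} - 3t - 1)$.

Consequently, for any positive $x$,

(b) $\mathbb{P}(Z \le \mathbb{E}(Z) - x) \le \exp\bigl(-\frac{v}{9}h\bigl(\frac{3x}{v}\bigr)\bigr)$,

where $h(x) = (1 + x)\log(1 + x) - x$, and

(c) $\mathbb{P}(Z \le \mathbb{E}(Z) - x) \le \exp\Bigl(-\frac{x^2}{v + \sqrt{v^2 + 2vx} + x}\Bigr) \le \exp\Bigl(-\frac{x^2}{2v + 2x}\Bigr)$.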
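The left-tail bounds can be compared numerically in the same way. The sketch below assumes the bounds read $\exp(-(v/9)h(3x/v))$, $\exp(-x^2/(v + \sqrt{v^2 + 2vx} + x))$ and $\exp(-x^2/(2v + 2x))$ with $v = 2\mathbb{E}(Z) + V_n$ and $h(u) = (1 + u)\log(1 + u) - u$; the function names and the sample inputs are hypothetical.

```python
import math


def h(u):
    """Bennett function h(u) = (1 + u) log(1 + u) - u."""
    return (1.0 + u) * math.log(1.0 + u) - u


def left_tail_bounds(x, mean_z, v_n):
    """Evaluate the bounds (b) and (c) on P(Z <= E(Z) - x), as assumed above."""
    v = 2.0 * mean_z + v_n
    bound_b = math.exp(-(v / 9.0) * h(3.0 * x / v))
    bound_c_sharp = math.exp(-x * x / (v + math.sqrt(v * v + 2.0 * v * x) + x))
    bound_c_simple = math.exp(-x * x / (2.0 * v + 2.0 * x))
    return bound_b, bound_c_sharp, bound_c_simple


# Hypothetical inputs: E(Z) = 2, V_n = 10, deviation x = 5.
left_b, left_c_sharp, left_c_simple = left_tail_bounds(x=5.0, mean_z=2.0, v_n=10.0)
```

Since (b) is the exact Chernoff bound associated with (a) and (c) comes from a larger bound on the log-Laplace transform, one expects (b) to be the smallest of the three values.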

Remark 1.3. Theorem 1.2(b) improves on Theorem 1.1, inequality (2)
in Klein (2002). However, Klein gives additional results for functions with
values in $]-\infty,1]$ and subexponential tails on the left [cf. inequality (3),
Theorem 1.1].

Let us now apply Theorems 1.1 and 1.2 to randomized processes, as defined
in Ledoux and Talagrand [(1991), Section 4.3]. Let $X_1, X_2, \ldots, X_n$ be
a sequence of independent and centered random variables with values in
$[-1,1]$. Let $T$ be some countable set and let $\zeta_1, \zeta_2, \ldots, \zeta_n$
be numerical functions on $T$. Let

$Z = \sup\{X_1\zeta_1(t) + X_2\zeta_2(t) + \cdots + X_n\zeta_n(t) : t \in T\}$.

The random variable $Z$ corresponds to the class of functions
$\mathcal{S} = \{s_t : t \in T\}$, where the components $s_t^i$ of $s_t$ are defined
by $s_t^i(x) = x\zeta_i(t)$. Assuming that

$V_n = \sup_{t \in T} \sum_{k=1}^n \zeta_k^2(t)\mathbb{E}(X_k^2) < \infty$ and $M = \sup_{k \in [1,n]} \sup_{t \in T} |\zeta_k(t)| < \infty$,

Corollary 1.1 gives $\mathrm{Var}\, Z \le V_n + 2\mathbb{E}(Z)$. Let us compare this
variance bound with the known results. Theorem 3 in Bobkov (1996) applied
to $Z$ yields $\mathrm{Var}\, Z \le 2V$, where

$V = \mathbb{E} \sup_{t \in T} \sum_{k=1}^n (X_k\zeta_k(t))^2$

is Talagrand's variance factor. If the random variables $X_1, X_2, \ldots, X_n$
are symmetric signs, then $Z$ is the maximum of a Rademacher process and
$V = V_n$. In that case Corollary 1.1 improves the known bounds on
$\mathrm{Var}\, Z$ as soon as $2\mathbb{E}(Z) < V_n$. For Rademacher processes, the
concentration inequality (4.10) in Ledoux and Talagrand (1991) yields

(1.4) $\mathbb{P}(Z \ge m_Z + x) \le \exp(-x^2/(8V_n))$,

where $m_Z$ denotes a median of $Z$. Theorems 1.1 and 1.2 provide exponential
bounds with a factor 2 instead of 8. However, our variance factor is greater
than $V_n$ and our bounds are not sub-Gaussian. Finally, we refer the reader to
Bousquet (2003) or Panchenko (2003) for concentration inequalities (with
suboptimal variance factor) for randomized or empirical processes in the
unbounded case.

2. Tensorization of entropy and related inequalities. In this section we
apply the method of tensorization of the entropy to get an upper bound
on the entropy of positive functionals $f$ of independent random variables
$X_1, X_2, \ldots, X_n$.

Notation 2.1. Let $\mathcal{F}_n$ be the $\sigma$-field generated by
$(X_1, \ldots, X_n)$ and let $\mathcal{F}_n^k$ be the $\sigma$-field generated by
$(X_1, \ldots, X_{k-1}, X_{k+1}, \ldots, X_n)$. Let $E_n^k$ denote the conditional
expectation operator associated to $\mathcal{F}_n^k$.

In this paper, the main tool for proving concentration inequalities is the 
following consequence of the tensorization inequality in Ledoux (1996). 
Proposition 2.1. Let $f$ be some positive $\mathcal{F}_n$-measurable random
variable such that $\mathbb{E}(f\log f) < \infty$ and let $g_1, g_2, \ldots, g_n$ be
any sequence of positive and integrable random variables such that
$\mathbb{E}(g_k\log g_k) < \infty$. Then

$\mathbb{E}(f\log f) - \mathbb{E}(f)\log\mathbb{E}(f) \le \sum_{k=1}^n \mathbb{E}(g_k\log(g_k/E_n^k g_k)) + \sum_{k=1}^n \mathbb{E}((f - g_k)\log(f/E_n^k f))$.

Proof. Set $f_k = E_n^k f$. By the tensorization inequality in Ledoux (1996),

(2.1) $\mathbb{E}(f\log f) - \mathbb{E}(f)\log\mathbb{E}(f) \le \sum_{k=1}^n \mathbb{E}(f\log(f/f_k))$.

Now

(2.2) $\mathbb{E}(f\log(f/f_k)) = \mathbb{E}(g_k\log(f/f_k)) + \mathbb{E}((f - g_k)\log(f/f_k))$.

Since $E_n^k(f/f_k) = 1$, we have

$\mathbb{E}(g_k\log(f/f_k)) \le \sup\{\mathbb{E}(g_k h) : h\ \mathcal{F}_n\text{-measurable},\ E_n^k(e^h) = 1\}$.

Hence, from the duality formula for the relative entropy in Ledoux (1996),

$\mathbb{E}(g_k\log(f/f_k)) \le \mathbb{E}(g_k\log(g_k/E_n^k g_k))$.

Together with (2.2), it implies that

(2.3) $\mathbb{E}(f\log(f/f_k)) \le \mathbb{E}(g_k\log(g_k/E_n^k g_k)) + \mathbb{E}((f - g_k)\log(f/f_k))$,

and Proposition 2.1 follows. □

3. Right-hand side deviations. To prove Theorems 1.1 and 1.2, we start
by proving the results for a finite class of functions. The results in the
countable case are derived from the finite case using the Beppo Levi lemma.
Consequently, throughout the sequel we may assume that
$\mathcal{S} = \{s_1, \ldots, s_m\}$.

As mentioned in the Introduction, the deviation of Z on the right is 
easier to handle than the deviation on the left. In fact, for positive t, the 
functional $\exp(tZ)$ is an increasing and convex function with respect to the
variables $s^k(X_k)$. This is not the case for negative values of $t$. Consequently,
upper bounds for the Laplace transform of Z via the Herbst-Ledoux method 
are more difficult to handle for negative values of t. In Section 4, we will 
introduce compensated processes in order to handle the deviation on the 
left. 

Definition 3.1. Let $\tau$ be the first integer such that $Z = S_n(s_\tau)$. Set
$f = \exp(tZ)$ and $f_k = E_n^k(f)$. Let $P_n^k$ denote the conditional probability
measure conditionally to $\mathcal{F}_n^k$.

Set

(3.1) $g_k = \sum_i P_n^k(\tau = i)\exp(tS_n(s_i))$.

Let $F$ denote the Laplace transform of $Z$. From Proposition 2.1,

(3.2) $tF'(t) - F(t)\log F(t) \le \sum_{k=1}^n \mathbb{E}(g_k\log(g_k/E_n^k g_k)) + \sum_{k=1}^n \mathbb{E}((f - g_k)\log(f/f_k))$.

Since $f - g_k \ge 0$, the upper bound on the second term in (3.2) will be derived
from Lemma 3.1.

Lemma 3.1. With the notation of Definition 3.1,
$\exp(ts_\tau^k(X_k)) \ge (f/f_k) \ge \exp(-2t)$ a.s.

Proof. Let $S_n^k(s) = S_n(s) - s^k(X_k)$. Let $\tau_k$ be the first integer in
$[1,m]$ such that

(3.3) $S_n^k(s_{\tau_k}) = \sup\{S_n^k(s) : s \in \mathcal{S}\} =: Z_k$.

Clearly, since the functions $s^k$ take their values in $[-1,1]$,

(3.4) $Z_k + 1 \ge Z \ge Z_k - 1$.

Since the stopping time $\tau_k$ is $\mathcal{F}_n^k$-measurable,
$E_n^k(s_{\tau_k}^k(X_k)) = 0$ by the centering assumption on the elements of
$\mathcal{S}$. It follows that

(3.5) $E_n^k f \ge \exp(tZ_k)E_n^k(\exp(ts_{\tau_k}^k(X_k))) \ge \exp(tZ_k) \ge \exp(tS_n^k(s_\tau))$.

Hence $f_k \ge f\exp(-ts_\tau^k(X_k))$, which implies the left-hand side
inequality in Lemma 3.1.

We now prove the second inequality in Lemma 3.1. From the left-hand
side inequality in (3.4), $\exp(tZ_k + t) \ge E_n^k(f)$. Next, from the right-hand
side inequality in (3.4), $\exp(tZ_k) \le \exp(tZ + t)$. Hence $f_k \le f\exp(2t)$,
which implies the second part of Lemma 3.1. □

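From Lemma 3.1 and the facts that $f - g_k \ge 0$ and $ts_\tau^k(X_k) \le t$ we get
that

(3.6) $\sum_{k=1}^n \mathbb{E}((f - g_k)\log(f/f_k)) \le t\sum_{k=1}^n \mathbb{E}(f - g_k)$.

Now let

(3.7) $h_k = \sum_i P_n^k(\tau = i)\exp(tS_n^k(s_i))$.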
The random variable $h_k$ is positive and $\mathcal{F}_n^k$-measurable. Hence, from
the variational definition of the relative entropy [cf. Ledoux (1996), page 68],

$E_n^k(g_k\log(g_k/E_n^k g_k)) \le E_n^k(g_k\log(g_k/h_k) - g_k + h_k)$.

Putting this inequality in (3.2) and using (3.6), we get

(3.8) $tF' - F\log F \le \sum_{k=1}^n \mathbb{E}(g_k\log(g_k/h_k) + (1 + t)(h_k - g_k)) + t\sum_{k=1}^n \mathbb{E}(f - h_k)$.

In order to bound the second term on the right-hand side, we will use
Lemma 3.2.

Lemma 3.2. Let $(h_k)_{k \le n}$ be the finite sequence of random variables
defined in (3.7). Then

$\sum_{k=1}^n \mathbb{E}(f - h_k) \le e^{2t}F(t)\log F(t)$.

Proof. Since the random variables $S_n^k(s)$ are $\mathcal{F}_n^k$-measurable,

$h_k = E_n^k\bigl(\sum_{i=1}^m 1_{\tau = i}\exp(tS_n^k(s_i))\bigr) = E_n^k(\exp(tS_n^k(s_\tau)))$.

It follows that

(3.9) $\sum_{k=1}^n \mathbb{E}(f - h_k) = \sum_{k=1}^n \mathbb{E}(f(1 - \exp(-ts_\tau^k(X_k)) - e^{2t}ts_\tau^k(X_k))) + te^{2t}F'(t)$.

Now, from Lemma 3.1, $ts_\tau^k(X_k) \ge \log(f/f_k) \ge -2t$. Since
$1 - \exp(-x) - e^{2t}x$ is a nonincreasing function of $x$ on the interval
$[-2t, +\infty[$, it follows that

$\mathbb{E}(f(1 - \exp(-ts_\tau^k(X_k)) - te^{2t}s_\tau^k(X_k))) \le \mathbb{E}(f - f_k - e^{2t}f\log(f/f_k))$.

From the equality $\mathbb{E}(f_k) = \mathbb{E}(f)$, we get that

$\mathbb{E}(f - f_k - e^{2t}f\log(f/f_k)) = -e^{2t}\mathbb{E}(f\log(f/f_k))$.

Hence, summing on $k$ and applying (2.1),

$\sum_{k=1}^n \mathbb{E}(f(1 - \exp(-ts_\tau^k(X_k)) - te^{2t}s_\tau^k(X_k))) \le e^{2t}(F\log F - tF')$,

which, together with (3.9), implies Lemma 3.2. □

Next, we bound up the first term on the right-hand side in (3.8). 
Definition 3.2. Let $r(t,x) = x\log x + (1 + t)(1 - x)$.

With the above definition

$g_k\log(g_k/h_k) + (1 + t)(h_k - g_k) = h_k r(t, g_k/h_k)$.

From the convexity of $r$ with respect to $x$,

$h_k r(t, g_k/h_k) \le \sum_i P_n^k(\tau = i)\exp(tS_n^k(s_i))\, r(t, \exp(ts_i^k(X_k)))$,

which ensures that

(3.10) $E_n^k(h_k r(t, g_k/h_k)) \le \sum_i P_n^k(\tau = i)\exp(tS_n^k(s_i))\,\mathbb{E}(r(t, \exp(ts_i^k(X_k))))$.

Here we need the bound below.

Lemma 3.3. Let $r$ be the function defined in Definition 3.2. For any
function $s$ in $\mathcal{S}$ and any positive $t$,

$\mathbb{E}\, r(t, \exp(ts^k(X_k))) \le \frac{t^2}{2}\mathbb{E}(s^k(X_k))^2$.

Proof. Let $\eta(x) = r(t, e^{tx}) = txe^{tx} + (t + 1)(1 - e^{tx})$. We will prove
that, for any $x \le 1$,

(3.11) $\eta(x) \le x\eta'(0) + (tx)^2/2$.

Set $\delta(x) = \eta(x) - x\eta'(0) - (tx)^2/2$. Then $\delta(0) = 0$ and
$\delta'(x) = t^2(x - 1)(e^{tx} - 1)$. Consequently, $\delta'(x)$ has the same sign
as $x(x - 1)$, which leads to (3.11). Since the random variables $s^k(X_k)$ are
centered, taking $x = s^k(X_k)$ and integrating with respect to the marginal law
of $X_k$, we get Lemma 3.3. □
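Inequality (3.11) is easy to probe numerically. The sketch below evaluates $\eta(x) = txe^{tx} + (t + 1)(1 - e^{tx})$ and the quadratic majorant $x\eta'(0) + (tx)^2/2$, using $\eta'(0) = -t^2$, on a grid of $[-1,1]$. The helper names, the grid and the sample values of $t$ are arbitrary choices made for illustration; a grid check is of course not a proof.

```python
import math


def eta(t, x):
    """eta(x) = r(t, exp(t x)) = t x e^{t x} + (t + 1)(1 - e^{t x})."""
    return t * x * math.exp(t * x) + (t + 1.0) * (1.0 - math.exp(t * x))


def quadratic_majorant(t, x):
    """x * eta'(0) + (t x)^2 / 2, where eta'(0) = -t^2."""
    return -t * t * x + (t * x) ** 2 / 2.0


def max_gap(t, grid):
    """Largest value of eta - majorant over the grid (expected to be <= 0)."""
    return max(eta(t, x) - quadratic_majorant(t, x) for x in grid)


grid = [-1.0 + 2.0 * i / 400 for i in range(401)]  # points of [-1, 1]
gaps = {t: max_gap(t, grid) for t in (0.1, 0.5, 1.0, 2.0)}
```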

From Lemma 3.3 and (3.10) we have

(3.12) $E_n^k(h_k r(t, g_k/h_k)) \le \frac{t^2}{2}E_n^k\bigl(\sum_i 1_{\tau = i}\exp(tS_n^k(s_i))\,\mathbb{E}(s_i^k(X_k))^2\bigr)$.

Now $\exp(tS_n^k(s_i)) \le \exp(2t + tS_n(s_i))$, and therefrom

$\sum_{k=1}^n \mathbb{E}(h_k r(t, g_k/h_k)) \le \frac{t^2e^{2t}}{2}\mathbb{E}\bigl(\sum_i 1_{\tau = i}\exp(tS_n(s_i))\sum_{k=1}^n \mathbb{E}(s_i^k(X_k))^2\bigr)$.

Since $\sum_k \mathbb{E}(s_i^k(X_k))^2 \le V_n$, we infer that

(3.13) $\sum_{k=1}^n \mathbb{E}(h_k r(t, g_k/h_k)) \le \frac{1}{2}t^2e^{2t}V_nF(t)$.
Together with Lemma 3.2 and (3.8), (3.13) leads to the differential inequality

(3.14) $tL' - (te^{2t} + 1)L \le t^2e^{2t}(V_n/2)$.

Let $\gamma(t) = t^{-2}\exp((1 - e^{2t})/2)$. Multiplying (3.14) by $\gamma$, we get

(3.15) $(t\gamma L)' \le (V_n/2)e^{2t}\exp((1 - e^{2t})/2)$.

Since $t\gamma(t) \sim (1/t)$ as $t$ tends to 0, the function $t\gamma(t)L(t)$ tends
to $\mathbb{E}(Z)$ as $t$ tends to 0, and integrating (3.15) gives

$t\gamma(t)L(t) \le \mathbb{E}(Z) + (V_n/2)(1 - \exp((1 - e^{2t})/2))$,

which implies Theorem 1.1(a).

To prove Theorem 1.1(b), we apply both Markov's inequality to the random
variable $\exp(tZ)$ and Theorem 1.1(a) with $t = \frac{1}{2}\log(1 + 2\log(1 + x/v))$.

To prove Theorem 1.1(c), we bound the log-Laplace transform of
$Z - \mathbb{E}(Z)$ via Lemma 3.4 and next we apply Markov's exponential inequality.
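The Chernoff step behind Theorem 1.1(b) can be traced numerically. With $t = \frac{1}{2}\log(1 + 2\log(1 + x/v))$ one has $\exp((e^{2t} - 1)/2) - 1 = x/v$, so the bound $\frac{t}{2}v(\exp((e^{2t} - 1)/2) - 1)$ on $L(t) - t\mathbb{E}(Z)$ equals $tx/2$ and the Markov exponent $-tx + tx/2$ reduces to $-tx/2$. The sketch below checks this identity; the function names and the values of `x` and `v` are hypothetical.

```python
import math


def log_laplace_excess(t, v):
    """Bound on L(t) - t E(Z) taken from Theorem 1.1(a): (t/2) v (exp((e^{2t}-1)/2) - 1)."""
    return 0.5 * t * v * (math.exp((math.exp(2.0 * t) - 1.0) / 2.0) - 1.0)


def chernoff_exponent(x, v):
    """Exponent -t x + bound(t) at the value of t used for Theorem 1.1(b)."""
    t = 0.5 * math.log(1.0 + 2.0 * math.log(1.0 + x / v))
    return -t * x + log_laplace_excess(t, v), t


x, v = 5.0, 14.0  # hypothetical deviation and variance factor
exponent, t_star = chernoff_exponent(x, v)
closed_form = -(x / 4.0) * math.log(1.0 + 2.0 * math.log(1.0 + x / v))
```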

Lemma 3.4. Under the assumptions of Theorem 1.1, for any $t$ in $]0, 2/3[$,

$L(t) \le t\mathbb{E}(Z) + (2\mathbb{E}(Z) + V_n)\frac{t^2}{2 - 3t}$.

Proof. From Theorem 1.1(a), it is enough to prove that

$\exp((e^{2t} - 1)/2) \le 1 + 2t/(2 - 3t)$.

This inequality holds if and only if

$\lambda(t) := \log(2 - t) - \log(2 - 3t) - (e^{2t} - 1)/2 \ge 0$.

Expanding $\lambda$ in power series yields $\lambda(t) = \sum_{j \ge 2} b_jt^j/j!$, where

$b_j = (j - 1)!((3/2)^j - (1/2)^j) - 2^{j-1} \ge 2(j - 1)! - 2^{j-1} \ge 0$.

Hence $\lambda(t) \ge 0$, which implies Lemma 3.4. □

Theorem 1.1(c) follows from Lemma 3.4 by noting that the Legendre
transform of the function $t \to t^2/(2 - 3t)$ (here $t < 2/3$) is equal to
$\frac{4}{9}(1 + (3x/2) - \sqrt{1 + 3x})$.
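Both ingredients of this step lend themselves to a quick numerical check. The sketch below, written under the assumption that the coefficients are $b_j = (j - 1)!((3/2)^j - (1/2)^j) - 2^{j-1}$ and that the Legendre transform is $\frac{4}{9}(1 + 3x/2 - \sqrt{1 + 3x})$, evaluates $\lambda$ on a grid of $]0, 2/3[$ and compares a brute-force supremum of $tx - t^2/(2 - 3t)$ with the closed form. The function names, grid sizes and sample values of $x$ are arbitrary.

```python
import math


def lam(t):
    """lambda(t) = log(2 - t) - log(2 - 3t) - (e^{2t} - 1)/2 on ]0, 2/3[."""
    return math.log(2.0 - t) - math.log(2.0 - 3.0 * t) - (math.exp(2.0 * t) - 1.0) / 2.0


def b_coeff(j):
    """Coefficient of t^j / j! in the power series of lambda."""
    return math.factorial(j - 1) * (1.5 ** j - 0.5 ** j) - 2.0 ** (j - 1)


def legendre_numeric(x, steps=20000):
    """Grid approximation of sup over t in ]0, 2/3[ of t x - t^2 / (2 - 3 t)."""
    best = 0.0
    for i in range(1, steps):
        t = (2.0 / 3.0) * i / steps
        best = max(best, t * x - t * t / (2.0 - 3.0 * t))
    return best


def legendre_closed_form(x):
    """(4/9)(1 + 3x/2 - sqrt(1 + 3x))."""
    return (4.0 / 9.0) * (1.0 + 1.5 * x - math.sqrt(1.0 + 3.0 * x))


min_lambda = min(lam((2.0 / 3.0) * i / 1000) for i in range(1, 1000))
```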

4. Compensated empirical processes. In this section we prove Theorem 
1.2. We start by proving Theorem 1.2(a). Throughout the section, t is any 
positive real. For $i$ in $\{1, \ldots, m\}$, let

$L_i(t) = \log\mathbb{E}(\exp(-tS_n(s_i)))$.

Let us define the exponentially compensated empirical process $T(s_i,t)$ by

(4.1) $T(s_i,t) = S_n(s_i) + t^{-1}L_i(t)$.

We set

(4.2) $Z_t = \sup_{1 \le i \le m} T(s_i,t)$ and $f_t = \exp(-tZ_t)$.

Let

(4.3) $F(t) = \mathbb{E}(f_t) = \mathbb{E}(\exp(-tZ_t))$ and $\Lambda(t) = \log F(t)$.

Our purpose is to obtain a differential inequality for $\Lambda$ via the
log-Sobolev method.

Before that, we link the log-Laplace $L_{-Z}$ of $-Z$ with $\Lambda$.

Lemma 4.1. For any positive $t$,

$L_{-Z}(t) - \sup_i L_i(t) \le \Lambda(t) \le \min(L_{-Z}(t), 0)$.

Proof. By definition of $Z_t$,

$\exp(-tZ_t) = \exp\bigl(\inf_i(-tS_n(s_i) - L_i(t))\bigr) \ge \exp\bigl(-tZ - \sup_i L_i(t)\bigr)$.

Consequently, for any positive $t$,

$\exp(\Lambda(t)) \ge \exp\bigl(-\sup_i L_i(t)\bigr)\mathbb{E}\exp(-tZ)$,

which gives the first inequality. Next, by definition of $Z_t$,

$\exp(\Lambda(t)) = \mathbb{E}\bigl(\inf_i\exp(-tS_n(s_i) - L_i(t))\bigr) \le \mathbb{E}(\exp(-tS_n(s_1) - L_1(t))) = 1$,

which ensures that $\Lambda(t) \le 0$. Moreover, $L_i(t) \ge 0$ by the centering
assumption on the random variables $S_n(s)$. Hence,

$\exp(\Lambda(t)) \le \mathbb{E}\bigl(\inf_i\exp(-tS_n(s_i))\bigr) = \mathbb{E}(\exp(-tZ))$,

which completes the proof of Lemma 4.1. □

Definition 4.1. Let $\tau_t$ denote the first integer $i$ such that
$Z_t = T(s_i,t)$, where $Z_t$ is defined in (4.2).

Since the random functions $T(s_i,t)$ are analytic functions of $t$, the
random function $f_t$ defined in (4.2) is continuous and piecewise analytic,
with derivative with respect to $t$, almost everywhere (a.e.):

(4.4) $f_t' = -Z_tf_t - (L_{\tau_t}'(t) - t^{-1}L_{\tau_t}(t))f_t = -(Z_t + tZ_t')f_t$,

where $tZ_t' = L_{\tau_t}'(t) - t^{-1}L_{\tau_t}(t)$ by convention. Consequently, the
Fubini theorem applies and

(4.5) $F(t) = 1 - \int_0^t \mathbb{E}((Z_u + uZ_u')f_u)\,du$.

Therefrom the function $F$ is absolutely continuous with respect to the
Lebesgue measure, with a.e. derivative in the sense of Lebesgue

(4.6) $F'(t) = -\mathbb{E}((Z_t + tZ_t')f_t)$.

Moreover, from the elementary lower bound $f_t \ge \exp(-2nt)$, the function
$\Lambda = \log F$ is absolutely continuous with respect to the Lebesgue measure,
with a.e. derivative $F'/F$ if $F'$ is the above defined function.

Definition 4.2. Let $f^k = E_n^k f_t$.

We now apply Proposition 2.1 to the random function $f_t$. Clearly,

(4.7) $\mathbb{E}(f_t\log f_t) - \mathbb{E}(f_t)\log\mathbb{E}(f_t) = \mathbb{E}(t^2Z_t'f_t) + tF'(t) - F(t)\log F(t)$ a.e.

Hence, applying Proposition 2.1 with $f = f_t$,

(4.8) $tF' - F\log F \le -\mathbb{E}(t^2Z_t'f) + \sum_{k=1}^n \mathbb{E}(g_k\log(g_k/E_n^k g_k)) + \sum_{k=1}^n \mathbb{E}((g_k - f)\log(f^k/f))$.

Now choose

(4.9) $g_k = \sum_i P_n^k(\tau_t = i)\exp(-tS_n(s_i) - L_i(t))$.

By definition of $Z_t$,

$\exp(-tS_n(s_i) - L_i(t)) \ge \exp(-tZ_t)$,

which implies that $g_k \ge f$. Therefore the upper bound on the second term
in (4.8) will be derived from Lemma 4.2.

Notation 4.1. For the sake of brevity, throughout we write $\tau = \tau_t$ and
$f = f_t$.

Lemma 4.2. Let $\psi(t) = (\exp(2t) + 1)/2$. Set
$l_{ki}(t) = \log\mathbb{E}(\exp(-ts_i^k(X_k)))$. Then a.s.

$(f^k/f) \le \exp(ts_\tau^k(X_k) + l_{k\tau}(t)) \le \psi(t)$.

Proof. Let $S_n^k(s) = S_n(s) - s^k(X_k)$. Set

$Z_k = \sup\{S_n^k(s) + t^{-1}\log\mathbb{E}(\exp(-tS_n^k(s))) : s \in \mathcal{S}\}$.

Let $\tau_k$ be the first integer in $[1,m]$ such that

$S_n^k(s_{\tau_k}) + t^{-1}\log\mathbb{E}(\exp(-tS_n^k(s_{\tau_k}))) = Z_k$.

Clearly

$f_t \le \exp(-tZ_k)\exp(-ts_{\tau_k}^k(X_k) - l_{k\tau_k}(t))$.

Since the stopping time $\tau_k$ is $\mathcal{F}_n^k$-measurable, it follows that

(4.10) $E_n^k f_t \le \exp(-tZ_k)$.

Now, by definition of $Z_k$,

(4.11) $\exp(-tZ_k) \le \exp(-tZ_t + ts_\tau^k(X_k) + l_{k\tau}(t))$,

which ensures that $(f^k/f) \le \exp(ts_\tau^k(X_k) + l_{k\tau}(t))$. To conclude
the proof of Lemma 4.2, recall that $\mathbb{E}(\exp(tX)) \le \cosh(t)$ for any
centered random variable $X$ with values in $[-1,1]$, which implies the second
part of Lemma 4.2. □

The next step to bound up the second term on the right-hand side is 
Lemma 4.3. However, due to technical difficulties, we are able to bound up 
this term only on some finite interval. 

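Lemma 4.3. Let $(g_k)$ be the finite sequence of random variables defined
in (4.9). Set $\varphi = \psi\log\psi$. Let $t_0$ be the positive solution of the
equation $\varphi(t) = 1$. Then, for any $t$ in $[0,t_0[$,

$\sum_{k=1}^n \mathbb{E}((g_k - f)\log(f^k/f)) \le \frac{\varphi(t)}{1 - \varphi(t)}\Bigl(\sum_{k=1}^n \mathbb{E}(g_k\log(g_k/E_n^k g_k)) - \mathbb{E}(f\log f)\Bigr)$.

Proof. Since the random variables $S_n^k(s)$ are $\mathcal{F}_n^k$-measurable,

(4.12) $E_n^k(g_k) = \sum_{i=1}^m P_n^k(\tau = i)\exp(-tS_n^k(s_i) - L_i(t) + l_{ki}(t))$.

It follows that $E_n^k(g_k) = E_n^k(f\exp(ts_\tau^k(X_k) + l_{k\tau}(t)))$. Hence

(4.13) $\sum_{k=1}^n \mathbb{E}(g_k - f) = \sum_{k=1}^n \mathbb{E}(f(\exp(ts_\tau^k(X_k) + l_{k\tau}(t)) - 1))$.

Setting $\eta_k = ts_\tau^k(X_k) + l_{k\tau}(t)$, we have

$\sum_{k=1}^n \mathbb{E}(g_k - f) = \sum_{k=1}^n \mathbb{E}(f(e^{\eta_k} - 1 - \psi(t)\eta_k)) + \psi(t)\mathbb{E}\bigl(f\sum_{k=1}^n \eta_k\bigr)$
$= \sum_{k=1}^n \mathbb{E}(f(e^{\eta_k} - 1 - \psi(t)\eta_k)) - \psi(t)\mathbb{E}(f\log f)$,

since $\sum_{k=1}^n \eta_k = -\log f$. Now, for $x$ in $]-\infty, \log\psi(t)]$, the
function $x \to e^x - 1 - x\psi(t)$ is nonincreasing. Since
$\log\psi(t) \ge \eta_k \ge \log(f^k/f)$ by Lemma 4.2, we infer that

$\sum_{k=1}^n \mathbb{E}(g_k - f) \le \sum_{k=1}^n \mathbb{E}(f((f^k/f) - 1 - \psi(t)\log(f^k/f))) - \psi(t)\mathbb{E}(f\log f)$
$\le \psi(t)\Bigl(\sum_{k=1}^n \mathbb{E}(f\log(f/f^k)) - \mathbb{E}(f\log f)\Bigr)$.

Hence, applying (2.3), we obtain

(4.14) $\sum_{k=1}^n \mathbb{E}(g_k - f) \le \psi(t)\Bigl(\sum_{k=1}^n \mathbb{E}\bigl(g_k\log(g_k/E_n^k g_k) + (g_k - f)\log(f^k/f)\bigr) - \mathbb{E}(f\log f)\Bigr)$.

Now, from Lemma 4.2 we know that $\log(f^k/f) \le \log\psi(t)$. Since
$(g_k - f) \ge 0$, it follows that

(4.15) $\sum_{k=1}^n \mathbb{E}((g_k - f)\log(f^k/f)) \le \log\psi(t)\sum_{k=1}^n \mathbb{E}(g_k - f)$
$\le \varphi(t)\Bigl(\sum_{k=1}^n \mathbb{E}\bigl(g_k\log(g_k/E_n^k g_k) + (g_k - f)\log(f^k/f)\bigr) - \mathbb{E}(f\log f)\Bigr)$.

Since $1 - \varphi(t) > 0$ for any $t$ in $[0,t_0[$, inequality (4.15) then implies
Lemma 4.3. □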
From Lemma 4.3 and the differential inequality (4.8) we then get that

$(1 - \varphi)(tF' - F\log F) \le \varphi\mathbb{E}(t^2Z_t'f - f\log f) - \mathbb{E}(t^2Z_t'f) + \sum_{k=1}^n \mathbb{E}(g_k\log(g_k/E_n^k g_k))$,

where $\varphi = \varphi(t)$. Now from (4.7), $\mathbb{E}(t^2Z_t'f - f\log f) = -tF'$,
whence

(4.16) $tF' - (1 - \varphi)F\log F \le -\mathbb{E}(t^2Z_t'f) + \sum_{k=1}^n \mathbb{E}(g_k\log(g_k/E_n^k g_k))$.

Let us now bound the sum on the right-hand side in (4.16). Set
$w_k = (g_k/E_n^k g_k)$. Then

$E_n^k(g_k\log(g_k/E_n^k g_k)) = E_n^k(g_k)E_n^k(w_k\log w_k)$.

From (4.12), by convexity of the function $x\log x$,

$E_n^k(g_k)w_k\log w_k \le \sum_i P_n^k(\tau = i)(-ts_i^k(X_k) - l_{ki}(t))\exp(-tS_n(s_i) - L_i(t))$.

Consequently

$E_n^k(g_k\log(g_k/E_n^k g_k)) \le \sum_i P_n^k(\tau = i)\exp(-tS_n^k(s_i) - L_i(t) + l_{ki}(t))(tl_{ki}'(t) - l_{ki}(t))$.

Since

$\sum_i P_n^k(\tau = i)\exp(-tS_n^k(s_i) - L_i + l_{ki})(tl_{ki}' - l_{ki}) = E_n^k\bigl(\sum_i 1_{\tau = i}\exp(-tS_n^k(s_i) - L_i + l_{ki})(tl_{ki}' - l_{ki})\bigr)$,

it implies that

$\mathbb{E}(g_k\log(g_k/E_n^k g_k)) \le \mathbb{E}(\exp(-tZ_t + ts_\tau^k(X_k) + l_{k\tau})(tl_{k\tau}' - l_{k\tau}))$.

From the convexity of the functions $l_{ki}$, we know that
$tl_{k\tau}' - l_{k\tau} \ge 0$. Hence, applying Lemma 4.2, we get

$\mathbb{E}(g_k\log(g_k/E_n^k g_k)) \le \psi(t)\mathbb{E}((tl_{k\tau}' - l_{k\tau})f)$.

Since $t^2Z_t' = tL_\tau' - L_\tau$, it follows that

(4.17) $-\mathbb{E}(t^2Z_t'f) + \sum_{k=1}^n \mathbb{E}(g_k\log(g_k/E_n^k g_k)) \le (\psi(t) - 1)\mathbb{E}((tL_\tau' - L_\tau)f)$.

Both (4.16) and (4.17) yield, for $t$ in $[0,t_0[$,

(4.18) $tF' - (1 - \varphi)F\log F \le (\psi(t) - 1)\mathbb{E}((tL_\tau' - L_\tau)f)$.

Since $tL_\tau' - L_\tau \le \sup_i(tL_i' - L_i)$, dividing by $F$, we infer that

(4.19) $t\Lambda' - (1 - \varphi)\Lambda \le (\psi(t) - 1)\sup_i(tL_i' - L_i)$.

Next we derive an upper bound on $tL_i' - L_i$ from Lemma 4.4.

Lemma 4.4. Let $Y$ be a random variable with values in $]-\infty,1]$, such
that $\mathbb{E}(Y^2) < +\infty$. Then, for any positive $t$,

$\mathbb{E}(tYe^{tY}) - \mathbb{E}(e^{tY})\log\mathbb{E}(e^{tY}) \le \mathbb{E}(Y^2)(1 + (t - 1)e^t)$.

Proof. From the variational definition of the entropy in Ledoux (1996)
we know that, for any positive constant $c$ and any positive random variable
$T$,

$\mathbb{E}(T\log T) - \mathbb{E}(T)\log\mathbb{E}(T) \le \mathbb{E}(T\log(T/c) - T + c)$.

Taking $c = 1$ and $T = \exp(tY)$, we then get that

$\mathbb{E}(tYe^{tY}) - \mathbb{E}(e^{tY})\log\mathbb{E}(e^{tY}) \le \mathbb{E}((tY - 1)e^{tY} + 1)$.

Now, from l'Hôpital's rule for monotonicity the function
$x \to x^{-2}(1 + (x - 1)e^x)$ is nondecreasing on the real line. Hence, for any
positive $t$,

$(tY - 1)e^{tY} + 1 \le Y^2(1 + (t - 1)e^t)$,

which implies Lemma 4.4. □
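The monotonicity used in this proof can be inspected numerically. The sketch below defines $q(x) = x^{-2}(1 + (x - 1)e^x)$, extended near $x = 0$ by the first terms of its Taylor expansion $1/2 + x/3$ to avoid cancellation, and checks on a grid that $q$ is nondecreasing and that $(ty - 1)e^{ty} + 1 \le y^2(1 + (t - 1)e^t)$ for $y \le 1$. The function name `q`, the cutoff `1e-4` and the grids are choices made here for illustration only.

```python
import math


def q(x):
    """q(x) = (1 + (x - 1) e^x) / x^2, with q(0) = 1/2 by continuity."""
    if abs(x) < 1e-4:
        # First-order Taylor expansion around 0 (avoids catastrophic cancellation).
        return 0.5 + x / 3.0
    return (1.0 + (x - 1.0) * math.exp(x)) / (x * x)


grid = [-5.0 + 0.01 * i for i in range(801)]  # points of [-5, 3]
values = [q(x) for x in grid]
is_nondecreasing = all(b >= a for a, b in zip(values, values[1:]))


def pointwise_gap(t, y):
    """Left side minus right side of the pointwise inequality for y <= 1."""
    return (t * y - 1.0) * math.exp(t * y) + 1.0 - y * y * (1.0 + (t - 1.0) * math.exp(t))


worst_gap = max(
    pointwise_gap(t, y)
    for t in (0.5, 1.0, 2.0)
    for y in grid
    if y <= 1.0
)
```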

Let $Y_k = -s_i^k(X_k)$. From the centering assumption, $\mathbb{E}\exp(tY_k) \ge 1$.
Hence we have

$tl_{ki}'(t) - l_{ki}(t) \le \mathbb{E}(tY_ke^{tY_k}) - \mathbb{E}(e^{tY_k})\log\mathbb{E}(e^{tY_k}) \le (1 + (t - 1)e^t)\,\mathrm{Var}\, Y_k$

by Lemma 4.4. Since $tL_i' - L_i = \sum_k (tl_{ki}' - l_{ki})$, it ensures that

(4.20) $tL_i' - L_i \le V_n(1 + (t - 1)e^t)$.

Both the above bound and (4.19) lead to the differential inequality below.

Proposition 4.1. For any $t$ in $[0,t_0[$,

(4.21) $t\Lambda' - (1 - \varphi)\Lambda \le \frac{1}{2}V_n(e^{2t} - 1)(1 + (t - 1)e^t)$.
It remains to bound $\Lambda$. Set $\tilde\Lambda(t) = t^{-1}\Lambda(t)$ and

(4.22) $I(t) = \int_0^t \frac{\varphi(u)}{u}\,du$.

Then $(\tilde\Lambda e^I)' = t^{-2}(t\Lambda' - (1 - \varphi)\Lambda)e^I$. Consequently,
from Proposition 4.1,

(4.23) $(\tilde\Lambda e^I)' \le \frac{V_n}{2t^2}(e^{2t} - 1)(1 + (t - 1)e^t)e^{I(t)}$.

Since $\tilde\Lambda e^I$ is absolutely continuous with respect to the Lebesgue
measure, integrating (4.23) yields

(4.24) $\tilde\Lambda(t) \le \tilde\Lambda(\varepsilon)e^{I(\varepsilon) - I(t)} + \frac{V_n}{2}\int_\varepsilon^t u^{-2}(e^{2u} - 1)(1 + (u - 1)e^u)e^{I(u) - I(t)}\,du$

for $0 < \varepsilon < t$. The control of the integral on the right-hand side will be
done via the bounds for $\varphi$ below, whose proof is carried out in Section 5.

Lemma 4.5. For any $t$ in $[0,t_0]$, $t \le \varphi(t) \le t\exp(2t) - (t^2/2)$.

By Lemma 4.5, $\lim_{\varepsilon \to 0} I(\varepsilon) = 0$. Furthermore,
$\tilde\Lambda(\varepsilon) \le \varepsilon^{-1}L_{-Z}(\varepsilon)$ by Lemma 4.1. Therefore

(4.25) $\limsup_{\varepsilon \to 0} \tilde\Lambda(\varepsilon)e^{I(\varepsilon) - I(t)} \le -\mathbb{E}(Z)e^{-I(t)}$.

Now $I(u) - I(t) \le (u - t)$ by Lemma 4.5. Consequently, letting
$\varepsilon \to 0$ in (4.24) and applying (4.25), we get

(4.26) $\Lambda(t) \le -\mathbb{E}(Z)te^{-I(t)} + \frac{1}{2}V_nte^{-t}\int_0^t u^{-2}(e^{2u} - 1)(1 + (u - 1)e^u)e^u\,du$.

To bound $L_{-Z}$ we then apply the Bennett bound $L_i(t) \le V_n(e^t - t - 1)$
together with Lemma 4.1. This yields Proposition 4.2.

Proposition 4.2. Let the function $J$ be defined by

$J(t) = \frac{1}{2}\int_0^t u^{-2}(e^{2u} - 1)(1 + (u - 1)e^u)e^u\,du$

and let $I$ be the function defined in (4.22). For any $t$ in $[0,t_0]$,

$L_{-Z}(t) + t\mathbb{E}(Z) \le t\mathbb{E}(Z)(1 - e^{-I(t)}) + V_n(te^{-t}J(t) + e^t - t - 1)$.

To obtain Theorem 1.2(a) for $t$ in $[0,t_0]$, we bound the functions
appearing in Proposition 4.2 via Lemma 4.6, proved in Section 5.

Lemma 4.6. For any $t$ in $[0,t_0]$,

(a) $te^{-t}J(t) + e^t - t - 1 \le \frac{1}{9}(\exp(3t) - 3t - 1)$,

(b) $t(1 - e^{-I(t)}) \le \frac{2}{9}(\exp(3t) - 3t - 1)$.

Next, proceeding as in Klein (2002), we prove Theorem 1.2(a) for $t$ in
$[t_0,+\infty[$. For the sake of brevity, set $E = \mathbb{E}(Z)$. By Lemma 4.1, for
any positive $t$,

$L_{-Z}(t) + tE \le tE + \sup_i L_i(t) \le tE + V_n(e^t - t - 1) \le v\max(t/2, e^t - t - 1)$.

Now, let $t_1$ be the unique positive solution of the equation
$e^t - t - 1 = t/2$. The number $t_1$ belongs to $[0.76,0.77]$, whence $t_1 > t_0$
(note that $t_0 \in [0.46,0.47]$). If $t \ge t_1$, then $t/2 \le e^t - t - 1$. In
that case

$L_{-Z}(t) + tE \le v(e^t - t - 1) \le (v/9)(e^{3t} - 3t - 1)$,

which proves Theorem 1.2(a) for $t \ge t_1$.

If $t$ belongs to $[t_0,t_1]$, from the convexity of $L_{-Z}$ we have

$L_{-Z}(t) + tE \le \frac{v}{9}(e^{3t_0} - 3t_0 - 1)\frac{t_1 - t}{t_1 - t_0} + \frac{v}{2}t_1\frac{t - t_0}{t_1 - t_0} \le \frac{v}{9}(e^{3t} - 3t - 1)$,

which completes the proof of Theorem 1.2(a).

To prove Theorem 1.2(c), we note that, for any $t$ in $[0,1[$,

(4.27) $\frac{1}{9}(e^{3t} - 3t - 1) \le \frac{t^2}{2 - 2t}$

[cf. Rio (2000), page 152]. Theorem 1.2(c) follows from (4.27) via the usual
Cramér-Chernoff calculation.
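The numerical constants quoted in this argument can be recomputed with a few lines of code. The sketch below locates $t_0$, the positive root of $\psi\log\psi = 1$ with $\psi(t) = (e^{2t} + 1)/2$, and $t_1$, the positive root of $e^t - t - 1 = t/2$, by bisection, and then checks (4.27) on a grid of $[0,1[$. The helper `bisect`, the bracketing interval $[0.1, 1]$ and the grid are assumptions made for this illustration.

```python
import math


def bisect(func, lo, hi, tol=1e-12):
    """Root of func in [lo, hi] by bisection, assuming a sign change on the interval."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def psi(t):
    return (math.exp(2.0 * t) + 1.0) / 2.0


def phi(t):
    return psi(t) * math.log(psi(t))


t0 = bisect(lambda t: phi(t) - 1.0, 0.1, 1.0)
t1 = bisect(lambda t: math.exp(t) - 1.0 - 1.5 * t, 0.1, 1.0)

# Gap between the right and left sides of (4.27) on a grid of [0.01, 0.99].
min_gap_4_27 = min(
    t * t / (2.0 - 2.0 * t) - (math.exp(3.0 * t) - 3.0 * t - 1.0) / 9.0
    for t in (0.01 * i for i in range(1, 100))
)
```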

5. Technical tools. In this section, we prove Lemmas 4.5 and 4.6. 

Proof of Lemma 4.5. By definition of $\psi$ and $\varphi$,

$\varphi(t) = t\psi(t) + \psi(t)\log\cosh(t) \ge t$,

since $\psi(t) \ge 1$ for any nonnegative $t$. Next

$t\exp(2t) - (t^2/2) - \varphi(t) = \psi(t)\bigl(t\tanh(t) - \log\cosh(t) - (e^{2t} + 1)^{-1}t^2\bigr)$,

so that Lemma 4.5 holds if $p(t) := t\tanh(t) - \log\cosh(t) - t^2/(e^{2t} + 1) \ge 0$
for $t$ in $[0,t_0]$. Now $p(0) = 0$ and

$2\cosh^2(t)p'(t) = 2t - t(e^{-2t} + 1) + t^2 = t(1 + t - e^{-2t})$.

Since $\exp(-2t) \le 1 - t \le 1 + t$ for $t$ in $[0,1/2]$, the above identity
ensures that $p'(t) \ge 0$ on $[0,t_0]$ (recall that $t_0 < 1/2$), which implies
Lemma 4.5. □
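The two-sided bound $t \le \varphi(t) \le te^{2t} - t^2/2$ and the sign of $p$ can be probed on a grid. The sketch below assumes $\psi(t) = (e^{2t} + 1)/2$, $\varphi = \psi\log\psi$ and $p(t) = t\tanh(t) - \log\cosh(t) - t^2/(e^{2t} + 1)$, and uses $0.47$ as an upper estimate of $t_0$; the grid and the tolerance are arbitrary.

```python
import math


def psi(t):
    return (math.exp(2.0 * t) + 1.0) / 2.0


def phi(t):
    return psi(t) * math.log(psi(t))


def p(t):
    """p(t) = t tanh(t) - log cosh(t) - t^2 / (e^{2t} + 1)."""
    return t * math.tanh(t) - math.log(math.cosh(t)) - t * t / (math.exp(2.0 * t) + 1.0)


def lemma_4_5_holds(t, tol=1e-12):
    """Check t <= phi(t) <= t e^{2t} - t^2/2 up to a small tolerance."""
    upper = t * math.exp(2.0 * t) - t * t / 2.0
    return t - tol <= phi(t) <= upper + tol


grid = [0.47 * i / 470 for i in range(471)]  # points of [0, 0.47]
all_hold = all(lemma_4_5_holds(t) for t in grid)
min_p = min(p(t) for t in grid)
```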
Proof of Lemma 4.6. We start by proving (a). Clearly (a) holds if

(5.1) a(t) = ge*(e 3 ' - 3t - 1) + e*(l +t-e*)- J(t) > 

for any t in [0,4] [with the convention a(0) =0]. The function a is analytic 
on the real line. To prove (5.1), we then note that a^ l \0) = for i = 1,2. 
Consequently (a) holds if, for t in [0,4], 

(5.2) a (3) (t) = e 3i (-i + (19/3)) - 4e 2t + e 4 (St - 5) + (8/3) > 0. 
Now (5.2) holds if a( 4 )(t) > 0, since a( 3 )(0) > 0. Next 

(3{t) := e~'a (4) (t) = 3e 2t {-2t + 11) - 8e* - 3 

satisfies /3(0) > and, for t in [0,4], 

f3' {t) = 12e 2 *(5 - t) - 8e* > e*(12e* - 8) > 0, 

which ensures that f3{t) > for t in [0,4]. Hence Lemma 4.6(a) holds. 

To prove (b), we apply Lemma 4.5 to bound the function $I(t)$. This
gives

$I(t) \le \int_0^t (e^{2u} - u/2)\,du = \frac{e^{2t} - 1}{2} - \frac{t^2}{4}$.

Now, recall $t_0 < 1/2$. For $t$ in $[0,1/2]$, expanding $\exp(2t)$ in power
series yields

$(\exp(2t) - 1)/2 = t + t^2 + 4t^3\sum_{k \ge 3}\frac{(2t)^{k-3}}{k!} \le t + t^2 + 4t^3\sum_{k \ge 3}\frac{1}{k!}$.

Hence, for $t \le 1/2$,

(5.3) $I(t) \le t + \frac{3}{4}t^2 + (4e - 10)t^3 \le t + \frac{3}{4}t^2 + \frac{7}{8}t^3 =: \gamma(t)$.

From (5.3), Lemma 4.6(b) holds if

$d(t) = \frac{2}{9}(e^{3t} - 3t - 1) - t + t\exp(-\gamma(t)) \ge 0$.

Now $d(0) = d'(0) = 0$ and

$d''(t) = 2e^{-\gamma(t)}\bigl(e^{4t + (3/4)t^2 + (7/8)t^3} - 1 - \frac{7}{4}t - \frac{15}{4}t^2 + \frac{15}{4}t^3 + \frac{63}{16}t^4 + \frac{441}{128}t^5\bigr)$.

Since

$e^{4t + (3/4)t^2 + (7/8)t^3} \ge e^{4t} \ge 1 + 4t + 8t^2$,

we have $d''(t) > 0$ for any positive $t$. Consequently, $d(t) \ge 0$, which implies
Lemma 4.6(b). □
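The two elementary facts used in the proof of (b) can be checked on a grid of $[0,1/2]$: the closed form $(e^{2t} - 1)/2 - t^2/4$ of the integral is dominated by $\gamma(t) = t + \frac{3}{4}t^2 + \frac{7}{8}t^3$, and $d(t) = \frac{2}{9}(e^{3t} - 3t - 1) - t + t\exp(-\gamma(t))$ is nonnegative. The sketch below assumes these expressions; the function names and the grid are illustrative.

```python
import math


def gamma(t):
    """gamma(t) = t + (3/4) t^2 + (7/8) t^3, the upper bound on I(t) in (5.3)."""
    return t + 0.75 * t ** 2 + 0.875 * t ** 3


def integral_bound(t):
    """Closed form of the integral of exp(2u) - u/2 over [0, t]."""
    return (math.exp(2.0 * t) - 1.0) / 2.0 - t * t / 4.0


def d(t):
    """d(t) = (2/9)(e^{3t} - 3t - 1) - t + t exp(-gamma(t))."""
    return (2.0 / 9.0) * (math.exp(3.0 * t) - 3.0 * t - 1.0) - t + t * math.exp(-gamma(t))


grid = [0.5 * i / 400 for i in range(401)]  # points of [0, 1/2]
bound_ok = all(integral_bound(t) <= gamma(t) + 1e-12 for t in grid)
min_d = min(d(t) for t in grid)
```

The margin in the first check is small near $t = 1/2$, which reflects how tight the constant $7/8$ is compared with $4e - 10$.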

Acknowledgments. We would like to thank the referee for his suggestions,
which allowed us to improve Lemma 4.4 and its proof, as well as many
points of the writing.